Abstract: An 8-Gb/s 0.3-µm CMOS transceiver uses multilevel signaling (4-PAM) and transmit preshaping in combination with receive equalization to reduce intersymbol interference due to channel low-pass effects. High on-chip frequencies are avoided by multiplexing and demultiplexing the data directly at the pads. Timing recovery takes advantage of a novel frequency acquisition scheme and a linear phase-locked loop that achieves a loop bandwidth of 35 MHz, phase margin of 50°, and capture range of 20 MHz without a frequency acquisition aid. The transmitted 8-Gb/s data are successfully detected by the receiver after a 10-m coaxial cable. The 2 × 2 mm² chip consumes 1.1 W at 8 Gb/s with a 3-V supply.

Index Terms— Clock recovery, multi-level signaling, receiver 
equalizer networks, serial links. 



I. Introduction 

As the demand for higher data-rate communication increases, low-cost high-speed serial links using copper cables become more attractive for distances of 1-10 m [1], [2]. For multi-gigabit/second (Gb/s) applications, the data rate is limited by the cable skin-effect loss and the process technology. The 10-m coaxial cable (PE-142LL) used in this work has a −3-dB bandwidth of 1.2 GHz. This design differs from existing Gb/s links [1], [2] in its use of a receiver equalizer in combination with a transmitter filter to compensate for the cable characteristics. High on-chip frequencies are avoided by multiplexing and demultiplexing the data directly at the pads. To reduce the symbol rate, four-level pulse amplitude modulation (4-PAM) is used. A new proportional phase detector for data recovery is proposed, which does not suffer from the stability and bandwidth limitations of traditional bang-bang loops. A novel frequency acquisition architecture enables the receive phase-locked loop (PLL) to lock to the input stream under all process variations. The focus of this paper is the design and implementation of the high-speed link receiver. Details of the transmitter architecture are discussed in [5].

II. System Architecture 

Implementing truly optimal detection methods (in the information-theoretic sense) for multi-Gb/s rates demands high complexity and large area [3]. Instead, square pulses, which can be generated and detected with modest complexity, are used here as the basic communication symbols [4]. At rates well above the channel bandwidth, however, square pulses result in severe intersymbol interference (ISI), which reduces the data-eye openings. For a given data rate, the 4-PAM scheme halves the symbol rate compared to a conventional 2-PAM system. This symbol-rate reduction lowers not only the ISI in the channel but also the maximum required on-chip clock frequency.

Fig. 1. Two 4-PAM eye diagrams: (a) slow transition and (b) sharp transition.
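To make the symbol-rate argument concrete, the Python sketch below maps bit pairs onto four amplitude levels and computes the resulting symbol rate. The Gray-coded level assignment (00 → −3, 01 → −1, 11 → +1, 10 → +3) and the helper names `pam4_encode` and `symbol_rate` are illustrative assumptions, not the chip's documented mapping.

```python
# Hypothetical Gray-coded 4-PAM mapping: adjacent levels differ in exactly one bit,
# so a decision error between neighbouring levels corrupts only one bit.
GRAY_4PAM = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}


def pam4_encode(bits):
    """Map an even-length bit sequence onto 4-PAM levels, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [GRAY_4PAM[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def symbol_rate(bit_rate_bps, bits_per_symbol=2):
    """Symbols per second needed to carry bit_rate_bps (2 bits/symbol for 4-PAM)."""
    return bit_rate_bps / bits_per_symbol


if __name__ == "__main__":
    print(pam4_encode([0, 0, 0, 1, 1, 1, 1, 0]))  # [-3, -1, 1, 3]
    print(symbol_rate(8e9) / 1e9, "Gsym/s")       # 4.0 Gsym/s, versus 8.0 for 2-PAM
```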

To invert the channel, a pre-emphasis filter at the transmitter 
and an equalizer at the receiver are used. The pre-emphasis 
transmitter has a two-tap symbol-spaced finite-impulse re- 
sponse (FIR) filter that is used to cancel the tail of the cable 
pulse response for two subsequent symbol intervals [5]. The 
receiver equalizer is a one-tap half-symbol-spaced FIR filter, 
which is described by the following equation: 



$$V_{eq}(n\,t_s) = V_i(n\,t_s) - \alpha \cdot V_i\!\left(\left(n - \tfrac{1}{2}\right) t_s\right) \qquad (1)$$



where t_s is the symbol period or sampling interval and α is the programmable tap weight. This equalizer, using half-symbol-spaced sample values, can equalize the signal over a frequency range that is double that of the transmitter filter without an aliasing effect. Thus the high-frequency components of the signal that were not compensated by the transmitter filter can be equalized in the receiver. In the time domain, the receiver equalizer sharpens the transition edges of the signal. Sharper transition edges result in a larger timing margin for signal detection, especially in multilevel signaling systems. Fig. 1 shows two eye diagrams for a 4-PAM system with different slew rates; the eye diagram with the sharper transitions clearly has the larger eye opening [Fig. 1(b)].
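Equation (1) can be checked numerically. The Python sketch below applies it to a 2×-oversampled stream, assuming even indices hold symbol-center samples and each odd index holds the edge sample half a symbol before the next center. It also evaluates the filter's response H(f) = 1 − α·exp(−jπ f t_s), whose magnitude rises from 1 − α at dc to 1 + α at f = 1/t_s, twice the Nyquist frequency of a symbol-spaced filter. The function names and the α value are hypothetical; the real equalizer works on analog currents, not numbers.

```python
import cmath
import math


def half_symbol_equalizer(samples, alpha):
    """Apply Eq. (1): V_eq(n t_s) = V_i(n t_s) - alpha * V_i((n - 1/2) t_s).

    `samples` is a 2x-oversampled waveform: even indices are symbol-center
    samples and each odd index is the edge sample half a symbol before the
    next center. The first center has no preceding edge sample and is skipped.
    """
    return [samples[c] - alpha * samples[c - 1] for c in range(2, len(samples), 2)]


def equalizer_gain(f_hz, t_s, alpha):
    """Magnitude of H(f) = 1 - alpha * exp(-j*2*pi*f*t_s/2) (half-symbol delay)."""
    return abs(1 - alpha * cmath.exp(-1j * 2 * math.pi * f_hz * t_s / 2))


if __name__ == "__main__":
    t_s = 0.25e-9  # 4 Gsym/s symbol period (8 Gb/s 4-PAM)
    print(half_symbol_equalizer([1, 2, 3, 4, 5], alpha=0.5))  # [2.0, 3.0]
    for f in (0.0, 0.5 / t_s, 1.0 / t_s):
        print(f"{f / 1e9:.1f} GHz: |H| = {equalizer_gain(f, t_s, 0.5):.3f}")
```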

The effects of the receiver and transmitter filters on a 0.2-ns pulse (5 Gsym/s) at the near and far ends of the 10-m channel are shown in Fig. 2. The unfiltered pulse response remains at a large value 0.2 ns after its peak (the next symbol sample point), while the preshaped and equalized signal is zero at that point. All the filter tap weights can be programmed to accommodate different channel characteristics.

Fig. 2. Pulse shape with and without filtering.

Fig. 3. Multiplexing and demultiplexing the high-speed signal onto the transmission line.
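Forcing the pulse to zero at the next sample points, as in Fig. 2, is the goal of a zero-forcing preshaping filter. The Python sketch below computes symbol-spaced correction taps that null a pulse response at the following sample points. The pulse samples are hypothetical numbers, and the tap structure (a unity main tap plus one correction tap per cancelled interval) is our assumption; the actual transmitter tap arrangement is described in [5]. A real driver would also rescale the taps to respect its output-swing limit.

```python
def zero_forcing_taps(pulse, n_post):
    """Taps [1, c1, ..., c_n] such that (pulse * taps)[k] == 0 for k = 1..n_post.

    `pulse` holds symbol-spaced samples of the channel pulse response, starting
    at its peak. Each new tap cancels whatever residue the earlier taps leave
    at the next sample point.
    """
    taps = [1.0]
    for k in range(1, n_post + 1):
        residue = sum(taps[j] * pulse[k - j] for j in range(len(taps)) if k - j < len(pulse))
        taps.append(-residue / pulse[0])
    return taps


def convolve(a, b):
    """Plain discrete convolution, used to check the preshaped pulse."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


if __name__ == "__main__":
    pulse = [1.0, 0.6, 0.3]             # hypothetical far-end samples, peak first
    taps = zero_forcing_taps(pulse, 2)  # cancel the two following symbol intervals
    shaped = convolve(pulse, taps)
    print(taps, [round(v, 12) for v in shaped])
```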

The on-chip frequency requirement is further reduced to one-fifth of the symbol rate (1/10 of the bit rate) by performing 5 : 1 multiplexing and 1 : 5 demultiplexing directly at the chip pads, allowing five symbols to be transmitted every clock cycle [7]. The abstract view of this architecture is shown in Fig. 3. The five symbols correspond to 10 bits that include four data symbols and one symbol for line coding. In this design, coding is performed on-chip to guarantee a high enough transition density for clock recovery.
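The rate bookkeeping in this paragraph can be written out as simple arithmetic. The Python sketch below (function and parameter names are ours) derives the symbol rate, the on-chip clock frequency, the bits carried per clock cycle, and the number of receiver sampling phases from the bit rate. It assumes 2 bits per 4-PAM symbol, 5 : 1 multiplexing, and the 2× receiver oversampling described in Section III.

```python
def link_rates(bit_rate_bps, bits_per_symbol=2, mux_ratio=5, oversampling=2):
    """Return (symbol rate, on-chip clock, bits per clock cycle, receiver sampling phases)."""
    symbol_rate = bit_rate_bps / bits_per_symbol  # 4-PAM: two bits per symbol
    clock_hz = symbol_rate / mux_ratio           # one clock cycle carries mux_ratio symbols
    bits_per_cycle = mux_ratio * bits_per_symbol
    rx_phases = mux_ratio * oversampling         # center + edge sample per symbol
    return symbol_rate, clock_hz, bits_per_cycle, rx_phases


if __name__ == "__main__":
    for rate in (8e9, 10e9):
        sym, clk, bits, phases = link_rates(rate)
        print(f"{rate / 1e9:.0f} Gb/s -> {sym / 1e9:.0f} Gsym/s, "
              f"{clk / 1e6:.0f} MHz clock, {bits} bits/cycle, {phases} sampling phases")
```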

III. Circuit Implementation

The block diagram of the complete transceiver chip is depicted in Fig. 4. The transmitter, comprising five identical drivers, uses different clock phases from a five-stage differential ring oscillator (TX-VCO) to multiplex the data stream onto the 50-Ω line. The detailed transmitter design is described in [5].

The receiver performs 1 : 5 demultiplexing at its input pads by sampling the signal with five of the ten clock phases from a five-stage differential ring oscillator (RX-VCO). The five additional, alternate clock phases allow 2× oversampling to recover timing and provide the samples required by the input equalizer with its half-symbol tap spacing. After equalization, the recovered data samples are converted to bits (binary data) by a bank of five 2-bit analog-to-digital converters (ADC's). Finally, the bits in each 10-bit data packet are pipelined and synchronized to a global clock.

Fig. 4. Transceiver general architecture.

A 2^7 − 1 pseudorandom bit sequence (PRBS) encoder and decoder, as well as a scannable transmit/receive data register, are provided on-chip for bit error rate (BER) testing. The 5/4 sym decoder removes the extra line-code symbol (Fig. 4). As the most important functions of the receiver are timing recovery, equalization, and 4-PAM data detection, each of these topics is discussed separately in the following sections.
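The paper does not state which generator polynomial the on-chip 2^7 − 1 PRBS uses. As an illustration only, the Python sketch below implements the common PRBS7 linear-feedback shift register for x^7 + x^6 + 1, whose output repeats every 2^7 − 1 = 127 bits for any nonzero seed. The function name and default seed are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence (x^7 + x^6 + 1 Fibonacci LFSR).

    The 7-bit state must be nonzero; the all-zero state would lock up the register.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be nonzero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of the x^7 and x^6 taps
        state = ((state << 1) | new_bit) & 0x7F       # shift left, insert feedback bit
        bits.append(new_bit)
    return bits


if __name__ == "__main__":
    seq = prbs7(254)
    print(seq[:127] == seq[127:], sum(seq[:127]))  # period 127, 64 ones per period
```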

A. Timing Recovery

Timing recovery uses data transitions to adjust the phase of the sampling receiver clocks. There are two main approaches for timing recovery from serial data: oversampling data recovery and tracking phase detection. In the oversampling technique, each transmitted symbol is sampled N times (N > 3), and the sample that is closest to the symbol center is selected by logic as the data [7]. This approach allows very fast timing recovery but suffers from large input loading (due to the large number of samplers) and phase quantization error. Furthermore, it requires complex logic to process many samples at high frequency.
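For contrast with the tracking approach used in this work, the selection step of the oversampling technique can be sketched as follows. This is a hypothetical, simplified Python model (binary data, n samples per symbol, edge phase chosen by majority vote), not the logic of [7]. It finds the sampling phase at which transitions usually occur and picks the phase half a symbol away.

```python
from collections import Counter


def pick_center_phase(samples, n):
    """Return the sampling phase (0..n-1) farthest from the observed data edges.

    `samples` holds binary decisions taken n times per symbol. A transition
    between samples i-1 and i marks phase i % n as an edge phase; the most
    frequent edge phase is taken as the symbol boundary.
    """
    edge_votes = Counter(i % n for i in range(1, len(samples)) if samples[i] != samples[i - 1])
    if not edge_votes:
        return None  # no transitions: timing cannot be estimated
    edge_phase = edge_votes.most_common(1)[0][0]
    return (edge_phase + n // 2) % n


if __name__ == "__main__":
    bits = [0, 1, 1, 0, 1]
    oversampled = [b for b in bits for _ in range(4)][1:]  # 4x, clock offset by one sample
    print(pick_center_phase(oversampled, 4))  # 1
```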

In the tracking phase detection technique, a data phase detector measures the phase difference between the transition edge of the transmitted symbol and the sampling clock. This error value is used to align the sampling point with the symbol center. Traditional proportional tracking data PLL's offer good loop stability and bandwidth, but most suffer from a systematic phase offset. Sampling the transitions with the same mechanism as the symbol centers reduces the systematic phase offset in data recovery. However, conventional sampling digital loops use bang-bang control, resulting in limited bandwidth and stability [6]. In this work, we have designed a novel proportional tracking phase detector to overcome these problems.

Fig. 5 shows the receiver 2× oversampling front end that is part of the phase detection scheme. When the receive PLL is properly locked to the input data, half of the ten samples represent symbol values at the centers of the symbols (S_c), and half are samples at the data transitions (S_e). The S_c samples are digitized by 2-bit flash ADC's, and the resulting received data bits are then resynchronized to a global clock. The S_e samples are amplified by linear amplifiers and kept as analog values that are used as part of a linear phase detector for timing recovery.

Fig. 6. Proportional tracking phase detection method (sampling clock lags the data).

Fig. 6 illustrates the proposed phase detection method for the special case of two-level data and a lagging sampling clock. Arrows in the figure show the clock sampling points only at symbol boundaries. When the loop is not in lock and a transition occurs, the edge samples are nonzero and are a monotonic function of the phase difference between the sampling clock edge and the data zero crossing. This function can be approximated by a linear function when the sampling edge occurs within the data transition interval and the loop is near its locking point. Thus, for edge samples within this interval of interest, we have



$$S_e = k \cdot \Delta\phi \qquad (2)$$

where k is the slope of the transition edge and Δφ is the phase difference, expressed in time, between the sampling clock edge and the data zero crossing. The S_e values are added together with the correct polarity, determined by the direction of each transition, and used to adjust the loop control voltage to correct for the phase error. As the correction on the loop control voltage is proportional to the phase error, this method results in proportional loop control. Therefore, this PLL combines the advantages of both a linear and a sampling loop. Also, the analog edge samples (S_e) at transitions are zero when in lock, resulting in a zero sum voltage (no ripple) on the loop control line. In bang-bang control, by contrast, fixed-amplitude correcting pulses are always applied to the control line, which results in ripple and, hence, timing error.
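The difference between proportional and bang-bang correction can be seen in a deliberately simplified model. The Python sketch below treats the loop as a first-order discrete update on a phase error in arbitrary units, with the edge sample equal to the error as in (2) with k folded into the gain. The gain, step size, and function name are assumptions, and the real loop is a second-order PLL with a VCO integrator.

```python
def track(phase_err, steps, gain=0.5, bang_bang=False, step=0.03):
    """Return the phase-error trajectory of a first-order sampled loop."""
    history = []
    for _ in range(steps):
        s_e = phase_err  # edge sample, proportional to the phase error (Eq. (2), k = 1)
        if bang_bang:
            # only the sign of S_e is used: fixed-size correction every update
            correction = step if s_e > 0 else (-step if s_e < 0 else 0.0)
        else:
            correction = gain * s_e  # correction proportional to the error
        phase_err -= correction
        history.append(phase_err)
    return history


if __name__ == "__main__":
    prop = track(0.1, 30)
    bb = track(0.1, 30, bang_bang=True)
    print(f"proportional: final error {prop[-1]:.2e}")
    print(f"bang-bang: error keeps dithering between {min(bb[-6:]):+.3f} and {max(bb[-6:]):+.3f}")
```

In this toy model the proportional update shrinks the error geometrically toward zero, while the bang-bang update settles into a limit cycle whose amplitude is set by the fixed step, which corresponds to the control-line ripple described above.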

Note that in a differential 4-PAM stream, there are three distinct transition types (Fig. 7). Of these three types, only type 1 makes a transition to the same magnitude but opposite polarity, which results in a zero crossing that occurs exactly at the midpoint between two symbols and therefore can be used for clock recovery. The two other types are ignored, as they convey wrong phase information. In every cycle (five symbols), one type-1 transition is guaranteed by the transmitter's 4/5 sym encoder.

Fig. 8. Proportional data phase detector architecture.

Fig. 8 shows the block diagram of the data phase detector that performs the proposed phase detection technique. The five amplified analog edge samples S_e are each fed into a decision logic block of the phase detector. Based on the two symbol values before and after the transition (2-bit data from the ADC's, e.g., d0d1), the phase detector adds the S_e values of type-1 transitions with the correct polarity to the control voltage of the loop and ignores the other two types of transitions by turning off all the switches of that stage. The add/subtract function is performed by current summing the differential analog samples with the correct polarity at the output of the phase detector (V_ph).
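The decision logic can be summarized behaviorally. The Python sketch below assumes ideal differential 4-PAM levels {−3, −1, +1, +3}, treats a transition x → −x as type 1, and adds each type-1 edge sample with a polarity set by the transition direction (rising +1, falling −1). That sign convention and the function names are assumptions. All other transitions contribute nothing, mirroring a stage whose switches are turned off.

```python
LEVELS = (-3, -1, +1, +3)  # assumed ideal differential 4-PAM symbol values


def is_type1(prev, nxt):
    """Type-1 transition: same magnitude, opposite polarity, so the zero crossing
    lies exactly midway between the two symbol centers."""
    return prev != 0 and nxt == -prev


def phase_detector_output(symbols, edge_samples):
    """Sum the edge samples of type-1 transitions with direction-dependent polarity.

    edge_samples[i] is the S_e value taken between symbols[i] and symbols[i + 1].
    """
    v_ph = 0.0
    for i, s_e in enumerate(edge_samples):
        prev, nxt = symbols[i], symbols[i + 1]
        if prev not in LEVELS or nxt not in LEVELS:
            raise ValueError("symbols must be ideal 4-PAM levels")
        if is_type1(prev, nxt):
            polarity = 1 if nxt > prev else -1  # rising edge: +S_e, falling edge: -S_e
            v_ph += polarity * s_e
        # any other transition is ignored (its stage is switched off)
    return v_ph


if __name__ == "__main__":
    symbols = [-3, +3, +1, -1, -1]
    edges = [0.2, 0.5, -0.1, 0.7]  # hypothetical amplified edge samples
    print(phase_detector_output(symbols, edges))  # about 0.3
```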

A charge pump (Fig. 8) converts V_ph to a proportional current using a differential voltage-to-current converter (V–I), as shown in Fig. 9. The voltage offsets in the charge pump and phase detector stages translate directly into a phase offset between the sampling clocks and the input data. Random offset due to transistor mismatch is reduced by increasing the device sizes and by careful layout. The systematic offset of the charge pump (V–I) is cancelled using an offset calibration loop that forces the charge pump to inject zero net charge (current) into the loop filter when the differential V_ph = 0, as shown in Fig. 9. The calibration circuit has an exact replica of the main V–I, whose inputs are tied together and set equal to the V_ph common-mode voltage (V_phcom). The replica V–I, which has a capacitor at its output, acts as a charge integrator. Thus, the source (I_up) and sink (I_dn) currents must be exactly equal to avoid charging the replica circuit output to either of the supply rails. The I_up and I_dn currents are forced equal by a differential comparator and current-trimming circuit combination (Fig. 9) that compares the replica output to the loop control voltage (V_ctl) and makes the replica output equal to V_ctl by trimming I_up and I_dn. A replica of the trim currents is applied to the main V–I. As a result, the loop charge pump generates equal I_up and I_dn when its differential inputs are equal, i.e., when V_ph = 0, or V_ph+ = V_ph−.

Fig. 10. Frequency acquisition loop for data phase detector.
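The offset calibration loop can be illustrated with a behavioral model. In the Python sketch below, the replica V–I output is a capacitor integrating the net current I_offset + I_trim. The comparator-plus-trim combination is idealized as a linear transconductance g that sets I_trim = g·(V_ctl − V_rep). These parameters, the Euler time step, and the function name are assumptions. At equilibrium the replica stops charging, so the trim equals the negative of the offset, and copying that trim into the main V–I cancels its systematic offset. With finite g, the replica settles at V_ctl + I_offset/g rather than exactly at V_ctl; a high-gain comparator shrinks that difference.

```python
def calibrate_offset(i_offset, v_ctl=1.5, g=0.5, c=1.0, dt=1.0, steps=200, v_rep=0.0):
    """Return (trim current, replica voltage) after the replica loop settles.

    Units are arbitrary; stability of this Euler model requires 0 < g * dt / c < 2.
    """
    i_trim = 0.0
    for _ in range(steps):
        i_trim = g * (v_ctl - v_rep)           # comparator + trim, idealized as linear
        v_rep += (i_offset + i_trim) * dt / c  # replica output integrates net current
    return i_trim, v_rep


if __name__ == "__main__":
    trim, v = calibrate_offset(i_offset=0.2)
    # Copying the trim into the main V-I leaves (almost) zero net current at V_ph = 0.
    print(f"trim = {trim:+.6f}, residual main-pump current = {0.2 + trim:+.2e}")
```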

To make the loop dynamics (gain, bandwidth, phase margin) track process variations and the frequency of operation, the loop filter design proposed in [8] is used. As there is no ripple on the loop control voltage when in lock, owing to the phase detector architecture, the loop filter does not require a third-order pole capacitor to damp the control voltage ripple. Therefore, the loop theoretically has only two poles and one zero, and is stable for any bandwidth (BW > 0) and loop gain. However, the capacitive loading of the VCO stages on the loop control line introduces a third-order pole that can make the loop unstable for very large gains. The loop gain, and consequently the bandwidth, increases with the number of useful (type-1) transitions per cycle and with the slew rate of the input data signal [k in (2)]. Using the 4/5 sym encoder, the density of type-1 transitions varies from a minimum of one to a maximum of five transitions per clock cycle. The input slew rate (k) is determined by the signal amplitude and the transition time, which is limited by the channel bandwidth. Hence, the loop parameters are chosen carefully to guarantee a loop bandwidth of >20 MHz and a phase margin of >45° under the worst operating conditions (lowest and highest loop gains). The loop is optimized for a random data sequence with an average type-1 transition density of two per cycle, a differential input amplitude of 1 V (500 mV single-ended), and a rise time of 200 ps. Under this condition, the loop has a bandwidth of 35 MHz (BW/f Tfti > 0.07) and a phase margin of 50°.

Fig. 11. Frequency monitor: (a) top view and (b) edge detector.
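The stability argument can be made quantitative with a textbook charge-pump PLL model. Assuming an open-loop gain L(s) = K(1 + s/ω_z)/[s²(1 + s/ω_p)], the phase margin at the unity-gain frequency ω_c is arctan(ω_c/ω_z) − arctan(ω_c/ω_p). The Python sketch below uses normalized units; K, ω_z, and the parasitic pole ω_p are hypothetical values, not the chip's. Without ω_p the margin only improves as K grows, while the VCO-loading pole makes the margin collapse at very large gains.

```python
import math


def loop_gain_mag(w, k, wz, wp=None):
    """|L(jw)| for L(s) = k (1 + s/wz) / (s^2 (1 + s/wp)); wp=None drops the parasitic pole."""
    mag = k * math.hypot(1.0, w / wz) / w**2
    if wp is not None:
        mag /= math.hypot(1.0, w / wp)
    return mag


def crossover(k, wz, wp=None, lo=1e-6, hi=1e12):
    """Unity-gain frequency by geometric bisection (|L| falls monotonically with w)."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if loop_gain_mag(mid, k, wz, wp) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def phase_margin_deg(k, wz, wp=None):
    """Phase margin in degrees: the two integrators contribute -180, the zero adds back."""
    wc = crossover(k, wz, wp)
    pm = math.atan(wc / wz)
    if wp is not None:
        pm -= math.atan(wc / wp)
    return math.degrees(pm)


if __name__ == "__main__":
    for k in (1.0, 10.0, 1000.0):
        print(f"K={k:7.1f}: PM ideal {phase_margin_deg(k, 1.0):5.1f} deg, "
              f"with parasitic pole {phase_margin_deg(k, 1.0, 20.0):5.1f} deg")
```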

As the phase detector has a limited frequency capture range, a frequency acquisition aid is employed to help acquire lock to a local reference clock at startup (Fig. 10). When the RX-VCO frequency differs from that of the incoming data, cycle slipping occurs. During cycle slipping, the sweeping of the clock phase causes the phase detector output (V_ph) to oscillate between early and late signals. The frequency of this oscillation (sweep speed) is equal to the frequency difference between the receive clocks and the incoming data. A frequency monitor circuit activates the frequency acquisition loop if the frequency difference is large, and activates the data recovery loop (deactivating frequency acquisition) when this difference is smaller than the capture range of the PLL. Fig. 11(a) shows the top view of the frequency monitor circuit. If there is a considerable frequency difference, the oscillations at the phase detector output (V_ph) cause the edge detector to produce pulses that continuously discharge C_one and keep the one-shot circuit output (V_one) at zero. Once the VCO frequency is close enough to the incoming data frequency (within the data PLL capture range), the pulse rate of the edge detector decreases such that C_one can charge high enough to switch V_one to one. At the rising edge of V_one, V_q, which is reset to zero at startup, is asserted and hands loop control over to the data phase detector. The edge detector is designed to have hysteresis [Fig. 11(b)], using positive feedback in its first-stage amplifier. Thus, it reacts only to oscillation amplitudes larger than a certain threshold level, which helps prevent erroneous transitions due to noise.

Fig. 13. (a) 2-b differential flash ADC and (b) differential preamplifier with one reference voltage.
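The frequency monitor can be modeled behaviorally. In the Python sketch below, V_ph is idealized as a sine-wave beat note at the frequency difference. A comparator with hysteresis (±threshold) stands in for the edge detector, and the one-shot is modeled as firing once C_one has charged for t_one without a discharge pulse. All names, amplitudes, and time constants are hypothetical. In this model handover happens roughly when half the beat period exceeds t_one.

```python
import math


def hysteresis_edges(samples, threshold):
    """Indices where a comparator with hysteresis toggles.

    The output goes high only above +threshold and low only below -threshold,
    so small noise around zero cannot make it chatter.
    """
    state = samples[0] > 0
    edges = []
    for i, v in enumerate(samples):
        if not state and v > threshold:
            state = True
            edges.append(i)
        elif state and v < -threshold:
            state = False
            edges.append(i)
    return edges


def one_shot_fires(edge_times, t_one, t_end):
    """True if C_one gets t_one seconds without a discharge pulse (V_one -> 1)."""
    last = 0.0
    for t in edge_times:
        if t - last >= t_one:
            return True
        last = t
    return t_end - last >= t_one


def monitor(delta_f, t_one=2e-6, amplitude=0.5, threshold=0.1, dt=1e-9, t_end=20e-6):
    """Model V_ph as a beat note at delta_f and report whether control is handed over."""
    n = int(t_end / dt)
    v_ph = [amplitude * math.sin(2 * math.pi * delta_f * i * dt) for i in range(n)]
    edge_times = [i * dt for i in hysteresis_edges(v_ph, threshold)]
    return one_shot_fires(edge_times, t_one, t_end)


if __name__ == "__main__":
    for df in (1e6, 20e3):
        print(f"beat {df / 1e3:.0f} kHz -> hand over to data loop: {monitor(df)}")
```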

B. Equalization 

Since equalization has to be performed at a very high frequency (the symbol rate) on each data sample, the speed limitations of the process make it impractical to implement this equalizer as a digital FIR filter. Thus, equalization is performed in the analog domain directly on the sampled values before they are used by other blocks. Fig. 12 shows the architecture of the one-tap half-symbol-spaced equalizer, where the receiver 2× oversampling provides the required samples.

Fig. 14. Simulated eye diagrams at 10 Gb/s for 10-m coaxial cable: (a) no filtering, (b) transmit preemphasis, and (c) receiver equalization and transmit preemphasis.

Having the present and former differential samples, the equalizer subtracts the weighted value of the former sample from the present sample. This operation is done by current summing two differential values with opposite polarity, as shown in Fig. 12. The weighted current is controlled by two tail NMOS transistors that act as resistors and should therefore operate in the triode region. The equalizer further improves the data-eye area (height × width) by up to 40% by sharpening the signal transitions.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

Fig. 15. Differential data eye over 10-m cable with preemphasis (a) at 10 Gb/s and (b) at 8 Gb/s (transmitter output; 500 mV/div, 50 ps/div).

C. Data Detection 

To convert the four-level analog symbols into digital bits, five 2-bit flash ADC's are implemented after the receiver 1 : 5 demultiplexer (Fig. 13). Each ADC consists of three preamplifiers and regenerative latches, followed by a Gray coder that converts the thermometer code into binary code, as shown in Fig. 13(a). Using Gray coding for the 4-PAM data also makes the thermometer-to-binary conversion easier.
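A truth-table view shows why Gray coding simplifies the decoder. In the Python sketch below, t1, t2, and t3 are the outputs of the low, middle, and high comparators for one symbol. The Gray MSB is simply t2 and the Gray LSB needs a single two-input XOR, whereas a natural-binary LSB would need a three-input XOR. The bit assignment (levels 0 to 3 → 00, 01, 11, 10) is an assumed standard Gray mapping, and the function names are ours.

```python
def thermometer_to_gray(t1, t2, t3):
    """2-bit Gray code from a valid thermometer code (t1, t2, t3 = low, middle, high comparator)."""
    return t2, t1 ^ t3


def thermometer_to_binary(t1, t2, t3):
    """Natural binary, for comparison: the LSB needs a three-input XOR."""
    return t2, t1 ^ t2 ^ t3


if __name__ == "__main__":
    for code in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:  # levels 0..3
        print(code, "gray", thermometer_to_gray(*code), "binary", thermometer_to_binary(*code))
```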

Comparison against the reference voltage is performed in the preamplification stage. Since a differential signaling scheme is used, only one reference voltage value is required to differentiate among the four input levels, as shown in Fig. 13(b). Also shown is how this reference voltage is applied to the three preamplifiers. The middle stage is a balanced differential comparator, and the two other stages are unbalanced by two transistors, which are controlled by V_ref. To balance the output capacitive loading of the preamplifiers, identical dummy transistors with grounded gates are placed on the opposite branch of the differential pair.

Fig. 16. Test setup for BER measurements.
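Behaviorally, the three preamplifiers compare the differential input against −V_ref, 0, and +V_ref, which is how a single reference value separates four differential levels. The Python sketch below models only those ideal decisions; offsets, noise, and the unbalancing transistors themselves are not modeled. The example levels of ±0.25 V and ±0.75 V with V_ref = 0.5 V are hypothetical.

```python
def flash_thermometer(v_diff, v_ref):
    """Ideal 2-bit differential flash ADC front end: returns (t1, t2, t3).

    t2 comes from the balanced middle comparator (threshold 0); t1 and t3 come
    from the two stages unbalanced to thresholds of -v_ref and +v_ref.
    """
    return int(v_diff > -v_ref), int(v_diff > 0.0), int(v_diff > v_ref)


if __name__ == "__main__":
    for level in (-0.75, -0.25, 0.25, 0.75):  # hypothetical differential symbol values
        print(level, flash_thermometer(level, v_ref=0.5))
```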



IV. Measurements 

Fig. 14 shows three different simulated eye diagrams at 10 Gb/s after the 10-m coaxial cable (PE-142LL): Fig. 14(a) is without transmit preemphasis or receive equalization, Fig. 14(b) is with preemphasis alone, and Fig. 14(c) is with both preemphasis and equalization applied. The improvement in the eye diagram across these three conditions shows the necessity of both filters.

The actual transmitter achieves a symbol rate of 5 Gsym/s (10 Gb/s) with an eye opening of 200 mV and 90 ps, and 4 Gsym/s (8 Gb/s) with an eye opening of 350 mV and 110 ps over 10 m of coaxial cable, using preemphasis (Fig. 15). Symbols without preemphasis after the 10-m cable show an eye opening with 60-mV height and 50-ps width at 4 Gsym/s. The transmitter output has an adjustable amplitude with a maximum of 1.2 V and a jitter of 11 ps (p-p) and 2 ps (rms).

The BER measurements are performed using the test setup shown in Fig. 16. The PRBS encoder in the transmitter generates a 10-Gb/s pseudorandom sequence that is sent over the line. The receiver detects the serial signal from the line and, after proper framing, sends it to the PRBS decoder. Whenever there is a bit error in the received sequence, the PRBS decoder generates an error pulse. The number of these pulses per second, divided by the bit rate, gives the system BER. The valid data window is measured by connecting the receive and transmit PLL's to two clock sources, as shown in Fig. 16, and varying the delay of one clock source relative to the other until a rapid increase in BER occurs. To set the reference voltage (V_ref) for the receiver 2-bit ADC's (Fig. 13), a differential dc voltage equal to the reference voltage level of the 4-PAM data is applied to the receiver, and V_ref is adjusted manually to the point where the ADC outputs toggle. The receiver successfully detects an 8-Gb/s 4-PAM data stream after 10 m with a 3-V supply. At data rates higher than 8 Gb/s, the receive PLL fails due to increased high-frequency noise in the loop. Raising the supply to 3.3 V allows the receiver to operate at up to 9 Gb/s. The decision logic of the data phase detector injects undesired charge onto the VCO control line, causing errors in the sampling clock phases and data detection. At 8 Gb/s over 10 m, the receiver had a BER of 10^-7 for a time window of 50 ps, whereas at 6 Gb/s, the BER decreased to 10^-15 for a window of 150 ps.

TABLE I
PERFORMANCE SUMMARY

Transmitter performance
  Maximum transmit rate            10 Gb/s @ 3.3 V, 8 Gb/s @ 3 V
  Output jitter @ 6 Gb/s           11 ps (p-p), 2 ps (rms)
  Max. eye opening @ 8 Gb/s        350 mV, 110 ps (10-m cable)
  Max. eye opening @ 10 Gb/s       200 mV, 90 ps (10-m cable)
Receiver performance
  Maximum receive rate             9 Gb/s @ 3.3 V, 8 Gb/s @ 3 V
  Data PLL jitter @ 8 Gb/s         28 ps (p-p), 4 ps (rms)
  Data PLL capture range           >20 MHz
  Min. swing to capture lock       ±400 mV
  Min. swing to maintain lock      ±300 mV
  Data PLL dynamics                BW ≈ 35 MHz, PM ≈ 50°
Power dissipation @ 8 Gb/s, 3 V
  Output driver                    220 mW
  Analog (2 PLLs)                  750 mW
  Input samplers and logic         130 mW
  Total                            1100 mW
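The BER arithmetic behind these measurements is simple. The Python sketch below converts an error count observed over a measurement interval into a BER. Using the standard Poisson argument for an error-free run, it also estimates how long a link must run without errors to claim a BER below a target at a given confidence (about 3/BER bits at 95%). The function names and example numbers are illustrative.

```python
import math


def ber(error_count, bit_rate_bps, seconds):
    """Bit error rate = errors / bits received."""
    return error_count / (bit_rate_bps * seconds)


def error_free_test_time(target_ber, bit_rate_bps, confidence=0.95):
    """Seconds of error-free operation needed to claim BER < target_ber.

    With zero observed errors, P(0 errors | BER = p, N bits) ~= exp(-N p), so
    N = -ln(1 - confidence) / p bits are required.
    """
    bits_needed = -math.log(1.0 - confidence) / target_ber
    return bits_needed / bit_rate_bps


if __name__ == "__main__":
    print(ber(80, 8e9, 0.1))                # 80 errors in 0.1 s at 8 Gb/s -> about 1e-07
    print(error_free_test_time(1e-7, 8e9))  # a few milliseconds
```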

Receiver equalization helps reduce the required transmitter preemphasis for the 10-m cable, effectively allowing longer cables to be used for the link. The receiver equalizer is adjusted manually, as is the transmitter preemphasis filter. However, unlike the transmitter output, the equalized waveform in the receiver cannot be viewed and used to set the optimal tap weight. Therefore, the equalizer tap is adjusted to minimize the measured BER.

The receiver data-recovery PLL requires that the input sym- 
bols have a minimum peak-to-peak swing of 800-mV differen- 
tial (400-mV swing on each line) to acquire lock and 600-mV 
differential swing to maintain lock. This PLL has a capture 
range of >20 MHz for a symbol stream with one transition per 
cycle (five symbols). The frequency acquisition circuit switches 
the loop control to the data phase detector when there is less 
than 100-kHz frequency difference between the transmitter and 
receiver reference clocks. The receive PLL has a jitter of 28 ps 
(p-p) and 4 ps (rms) when locked to the incoming data signal. 

The chip occupies 2 mm x 2 mm of die area. The transceiver 
die photo is shown in Fig. 17. Table I summarizes the transceiver 
chip performance. 




Fig. 17. Transceiver die photo. 



V. Conclusions 

Using parallelism, 4-PAM modulation, and analog transmit and receive FIR filters, data rates of over 8 Gb/s are achievable in conventional CMOS technology over long copper cables. Performance is further enhanced by a novel high-bandwidth linear data-recovery PLL with zero systematic offset that reduces the bit error rate due to random phase errors. A new frequency detector design guarantees frequency acquisition of the data-recovery PLL under all process variations.